Abstract. One of the goals of phylogenetic research is to find the species tree describing
the evolutionary history of a set of species. But the trees derived from genetic data with the
help of tree inference methods are gene trees, which need not coincide with the species tree.
This can happen, for example, when so-called deep coalescence events take place. It is also
known that species trees can differ from their most likely gene trees. Therefore, as a means
to find the species tree, it has been suggested to use subtrees of the gene trees, for example
triples, and to puzzle them together in order to find the species tree. In this paper, we
show that this approach may lead to wrong trees regarding the minimum deep coalescence
(MDC) criterion. In particular, we present an example in which the optimal MDC tree is
unique, but none of its triple subtrees fulfills the MDC criterion. In this sense, MDC is a
non-hereditary tree reconstruction method.

(Keywords: minimum deep coalescence (MDC), parsimony, incomplete lineage sorting,
subtree)


In phylogenetics, there are basically two types of tree reconstruction problems. On
the one hand, for given sequence data like DNA, RNA or protein data, one seeks to
reconstruct the phylogenetic tree which explains these data best. There are several
methods to do this, for example Maximum Parsimony (MP), Maximum Likelihood (ML),
distance methods or Bayesian methods. It has long been known that different methods
may lead to different trees (cf. Huson and Steel (2004), Fischer and Thatte (2010)).
However, even the same method can lead to different trees for the same set of species when
it is applied to different genes (cf. Degnan and Rosenberg (2006)). In this case, we have a
set of conflicting gene trees. It is therefore not trivial to find the species tree, i.e. the tree
describing the true evolutionary history of the species under investigation. So the second
tree reconstruction problem is concerned with estimating the species tree from a set of
input gene trees.

Different gene trees based on the same species tree can have various known causes.
For example, horizontal gene transfer may lead to gene trees which contradict each other.
In this case, a phylogenetic network (Kunin et al. (2005)) would be more appropriate than
a tree in order to describe the underlying process. But there are also cases when evolution
is treelike and still some gene trees do not coincide with the species tree. This can happen
when so-called incomplete lineage sorting takes place. For example, consider Figure 1. If
tree S on the right hand side of this figure is the underlying species tree and T is a gene
tree, then there are two instances (highlighted by the small arrows) where the coalescence
of individual lineages of T happens further towards the root of the tree than suggested by
S. This means that we have incomplete lineage sorting whenever two ancestral lineages in
the gene tree fail to coalesce before (looking backwards in the tree) more recent speciation
events take place (Maddison (1997)). This problem, which is more likely when branches of
the species tree are short and the population size of the respective species is large, is
therefore also known as deep coalescence. One way to estimate the species tree is thus to
seek the Minimum Deep Coalescence (MDC) tree, i.e. the (not necessarily unique) tree
which requires a minimum number of deep coalescence events to explain the given gene
trees.

The concept of MDC was first introduced by Maddison (1997), who also realized
that this concept is in some sense a parsimony concept. Note that parsimony in
phylogenetics usually refers to Maximum Parsimony (MP), a method of finding the tree
which requires the smallest number of mutations to explain a given sequence alignment by,
simply speaking, finding a consensus between all input sites in the alignment
(Bruen and Bryant (2008)). What Maddison means, though, is similar in the sense that a
consensus is aimed at, but in this case a consensus between different gene trees. However,
MP is known to have some drawbacks (cf. Felsenstein (1978), Fischer (2012),
Fischer and Thatte (2010)). One of those
drawbacks is sometimes referred to as non-heredity: for a given DNA alignment, it
can happen that there is a unique MP tree, but none of its subtrees is MP for the
corresponding subalignment (cf. Fischer (2012)). So in this regard, it is an interesting
question whether MDC, as a parsimony method in a different context, is hereditary or
non-hereditary.























In particular, it has been proposed (cf. Degnan and Rosenberg (2006)) that in
order to estimate the species tree, instead of estimating gene trees for each set of genes and
estimating the species tree from those, smaller gene trees like e.g. triples, i.e. trees on only
three species at a time, should be reconstructed and then puzzled together in order to build
the estimated species tree. This method is well-known in phylogenetics and often referred
to as tree puzzling (Ewing et al. (2008)).

Note that MDC is NP-hard (Zhang (2011)), i.e. it is hard to find a tree which
minimizes the sum of the MDC scores of the input gene trees. However, this does not
immediately imply that MDC is non-hereditary: even if it were hereditary, i.e. even if each
MDC tree always had an MDC subtree, it would not be clear which one to choose.

Therefore, in this paper we investigate the question whether MDC trees are 
hereditary. In particular, we will analyze whether a unique MDC tree necessarily has at 
least one MDC subtree. We answer this question negatively even for four taxa: We present 
an example of a set of gene trees on four taxa which have a unique MDC tree, which in 
turn has no subtree of size three that is MDC for the input gene trees’ subtrees of size 
three. This proves that MDC trees cannot be reconstructed from subtrees of smaller sizes. 
Moreover, the main idea used in this paper to derive our result is similar to the ideas
presented in Fischer (2012) in the sense that we construct a system of inequalities based on
a formula to calculate the MDC score proven by Than and Nakleh (2009) and check if the
solution space is empty or not. We believe that this approach can also be useful for
tackling other questions concerning MDC.


Preliminaries 


We need to introduce some concepts and notations before we can present our results. In
this paper, we discuss so-called rooted binary phylogenetic trees on a set X of species or
taxa. Recall that a tree is a connected acyclic graph, and a phylogenetic tree has its leaves
labelled by the names of the underlying set of species. Such a tree is rooted and binary if
all internal nodes have degree 3, except for one specific node representing the most recent
common ancestor of all species under investigation, which is called the root and has degree
2. When there is no ambiguity, we will refer to rooted binary phylogenetic trees only as
trees for short. For a tree T and a node v in T, we denote by T(v) the clade of T rooted at
v. The leaf set of this clade is called a cluster, and this cluster is denoted by C_T(v). When
the node v on which a clade t is pending is not explicitly stated, its leaf set can also
be denoted by C(t). In a tree T on species set X, the most recent common ancestor, or
MRCA for short, of a set Y ⊆ X of species is a node u in T such that all leaves in Y are
descendants of u and all other nodes with this property are closer to the root than u. So
the MRCA of a cluster C_T(v) is v. In the special case where Y = X, the MRCA is the root
of T.

When we consider a subtree T' of a tree T, this means we are restricting the leaf set
X to the leaf set Y ⊂ X of T', i.e. we delete all leaves which are not in Y as well as edges
leading to these leaves and, if this implies that the root no longer has degree 2, we also
delete the root and the remaining edge leading to the root. Then we suppress all resulting
nodes of degree 2 (other than possibly the MRCA of Y in case that the original root has
been deleted, because this MRCA is then the new root of T') in order to obtain a binary
tree again. In case a tree T' is derived from T in this manner, we write T' = T|_Y, i.e. T' is
regarded as the restriction of T to Y. Note that a clade is a particular kind of subtree,
namely a subtree which is induced by a node v of T. For example, consider tree T_1 from
Figure 2. Its restrictions T_1|_{1,2,3}, T_1|_{1,2,4}, T_1|_{1,3,4} and T_1|_{2,3,4} are
among the trees shown in Figure 3, but only T_1|_{1,2,3} is a clade of T_1, and thus T_1
contains the cluster {1,2,3}, but not, say, {1,2,4}.
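The restriction of a tree to a leaf subset can be sketched in a few lines of Python. The nested-tuple encoding (a leaf is an integer, an inner node is a pair of subtrees) and the function names `leaves`, `restrict` and `clusters` are assumptions made for this illustration and do not appear in the paper; the encoding of T_1 as (((1,2),3),4) is likewise an assumption, chosen to be consistent with the clusters (1,2) and (1,2,3) that the worked example later in this section lists for the species tree.

```python
def leaves(tree):
    """Return the leaf set of a nested-tuple tree (a leaf is an int)."""
    if isinstance(tree, int):
        return frozenset([tree])
    left, right = tree
    return leaves(left) | leaves(right)


def restrict(tree, keep):
    """Restrict `tree` to the leaves in `keep`.

    Deleted leaves vanish and inner nodes left with a single child are
    suppressed. Returns None if no leaf of `tree` lies in `keep`.
    """
    if isinstance(tree, int):
        return tree if tree in keep else None
    left, right = (restrict(child, keep) for child in tree)
    if left is None:
        return right
    if right is None:
        return left
    return (left, right)


def clusters(tree):
    """Return all clusters (leaf sets of clades) of `tree`."""
    if isinstance(tree, int):
        return {leaves(tree)}
    left, right = tree
    return clusters(left) | clusters(right) | {leaves(tree)}


T1 = (((1, 2), 3), 4)  # assumed encoding of T_1
for Y in ({1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}):
    is_clade = frozenset(Y) in clusters(T1)
    print(sorted(Y), restrict(T1, Y), "clade" if is_clade else "not a clade")
```

Under this encoding, all four restrictions are subtrees, but only the one on {1,2,3} corresponds to a cluster of the tree, in line with the example above.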

Note that when a collection of trees is given, this collection may actually be a
multiset, because some trees could occur more than once (e.g. if various genes lead to the
same gene tree). We refer to such a multiset of m (gene) trees as G_m, and we denote by
G the corresponding (simple) set which contains all trees of G_m exactly once. When such a
set of trees on the same set of species is given, their minimum deep coalescence tree, or
MDC tree for short, can be defined as the tree minimizing the number of so-called extra
lineages (Maddison (1997)). In order to define these extra lineages, we follow the approach
of Than and Nakleh (2009) and start by introducing the following mapping. For a given
gene tree T and a given species tree S on the same taxon set X, we fit T into S as follows:

1. Each taxon of T is mapped into the corresponding taxon in S.

2. If v' is the most recent common ancestor of a cluster C_T(v) of T in S, and if u' is the
parent node of v' in S, then v is mapped to some point on the edge (u', v') except u'.

3. If a node u is an ancestor of a node v in T, then for their images in S, say p_u and p_v,
we have that p_u is an ancestor of p_v in S, too.

This mapping is illustrated by Figure 1. Here, a gene tree T is mapped into a
species tree S. Only the mapping of the inner nodes of T is depicted by the dotted lines;
all taxa are mapped into their respective counterparts of S.

Now, given a gene tree T and a species tree S and the above mapping,
Than and Nakleh (2009) define the number of extra lineages of a branch in S as the
number of lineages of T that exit the branch (when looking backwards in time, i.e. towards
the root) minus 1. The number of extra lineages of T on S is then simply the sum of extra
lineages over all edges. In Figure 1, there are two extra lineages, which are highlighted by
the small arrows. Here, looking backwards in time and leaving an edge of the species tree,
at both arrows there are two dashed lineages of T present. As one lineage would be the
ideal case, both edges with two lineages give an extra lineage, which means there are in
total two extra lineages in this example.
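The definition via the mapping can be turned into a direct count, sketched below in Python. This is only an illustration under stated assumptions: trees are encoded as nested tuples, each gene tree node is represented by its cluster, its image in S is taken to be the smallest cluster of S containing it (the cluster of the MRCA), and a gene lineage is counted as exiting a branch of S when the image of its lower end lies at or below that branch and the image of its upper end lies strictly above it. All function names are hypothetical.

```python
def leaves(tree):
    """Leaf set of a nested-tuple tree; a leaf is an int."""
    if isinstance(tree, int):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def clusters(tree):
    """All clusters of `tree`, including trivial ones and the full leaf set."""
    if isinstance(tree, int):
        return {leaves(tree)}
    return clusters(tree[0]) | clusters(tree[1]) | {leaves(tree)}


def image(gene_cluster, species_clusters):
    """Cluster of the MRCA in S of a gene tree cluster (smallest superset)."""
    return min((c for c in species_clusters if gene_cluster <= c), key=len)


def gene_edges(tree):
    """Yield (child cluster, parent cluster) for every edge of the gene tree."""
    if isinstance(tree, int):
        return
    for child in tree:
        yield leaves(child), leaves(tree)
        yield from gene_edges(child)


def extra_lineages_by_mapping(gene, species):
    """Sum over all non-root branches of S of (exiting gene lineages - 1)."""
    s_clusters = clusters(species)
    root = leaves(species)
    total = 0
    for c in s_clusters - {root}:
        exiting = sum(
            1
            for child, parent in gene_edges(gene)
            if image(child, s_clusters) <= c and c < image(parent, s_clusters)
        )
        total += exiting - 1
    return total


species = (((1, 2), 3), 4)  # clusters (1,2) and (1,2,3)
gene = (((2, 3), 4), 1)     # clusters (2,3) and (2,3,4)
print(extra_lineages_by_mapping(gene, species))
print(extra_lineages_by_mapping(species, gene))
```

For the pair of trees used here, the first call reproduces the two extra lineages of the example; swapping the roles of the two trees gives three.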

















Figure 1: The mapping of the gene tree T = T_8 from Figure 2 into the species tree S = T_1. All leaves
of the gene tree are mapped into the corresponding leaves of the species tree; for all inner
nodes the mapping is indicated by the dotted lines. In total, two extra lineages are needed
and highlighted by the small arrows.


Now these extra lineages play a fundamental role as MDC trees can be defined with
their help: For a set G of gene trees, the tree S which requires the minimum number of
extra lineages in total, i.e. when the sum over all trees in G is taken, is called MDC tree.
Luckily, while it is hard to find an MDC tree, it is easy to calculate the number of extra
lineages of a tree S with the help of Theorem 2 of Than and Nakleh (2009):

Theorem 1 (Theorem 2 of Than and Nakleh (2009)). Let T be a gene tree and S a species
tree. Let (u', v') be an edge in S. Denote by t_1, t_2, ..., t_k all maximal clades of T such that
C(t_i) ⊆ C_S(v') for i ∈ {1, ..., k}. (Here, C(t_i) denotes the leaf set of t_i.) Then, the number
of extra lineages in (u', v') is k − 1.

This theorem gives a simple formula for calculating the number of extra lineages of
each edge of S, and thus, by taking the sum over all edges, of S. We denote by l(T, S)
the minimum number of extra lineages a tree T needs in a species tree S and call l(T, S)
the MDC score of T in S.

Simply put, for each branch we just have to count how many maximal clades of T
have their leaf set contained in the cluster of S pending on the edge under investigation,
and then subtract 1. The −1 is due to the fact that one lineage of the gene tree is needed per
lineage of the species tree, but all additional lineages are extra lineages. In Figure 1, we
have already seen that the total number of extra lineages is 2. However, we could have
derived this with the above theorem as follows: S has clusters (1,2), (1,2,3) as well as the
trivial clusters (1), (2), (3), (4) and (1,2,3,4), which all trees on taxon set X = {1,2,3,4}
have. T, on the other hand, has clusters (2,3), (2,3,4) and the trivial ones. Now for each
cluster of S we check how many maximal clusters of T are contained in this cluster. Note
that for the trivial clusters, the answer of course is always 1, because they are contained in
both trees. So we focus on the other clusters and start with (1,2). We find that (1) and (2)
of T are contained in this cluster, so this gives k = 2 for the edge inducing cluster (1,2).
Similarly, (1,2,3) contains clusters (1) and (2,3) and thus again, we have k = 2. So in total
we have (2 − 1) + (2 − 1) = 2 extra lineages, which confirms our earlier result.
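The cluster counting just described translates almost literally into code. The following Python sketch assumes a nested-tuple encoding of trees (a leaf is an integer, an inner node a pair of subtrees); the function names are hypothetical and not taken from Than and Nakleh (2009). The two trees are encoded from the clusters listed in the example.

```python
def leaves(tree):
    """Leaf set of a nested-tuple tree; a leaf is an int."""
    if isinstance(tree, int):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def clusters(tree):
    """All clusters of `tree`, trivial ones included."""
    if isinstance(tree, int):
        return {leaves(tree)}
    return clusters(tree[0]) | clusters(tree[1]) | {leaves(tree)}


def count_maximal_clades(gene, target):
    """k of Theorem 1: maximal clades of `gene` whose leaf set lies in `target`."""
    if leaves(gene) <= target:
        return 1  # the whole clade fits, so it is maximal; do not descend
    if isinstance(gene, int):
        return 0  # a single leaf outside the target cluster
    return sum(count_maximal_clades(child, target) for child in gene)


def mdc_score(gene, species):
    """l(gene, species): sum of k - 1 over all clusters of the species tree."""
    return sum(count_maximal_clades(gene, c) - 1 for c in clusters(species))


S = (((1, 2), 3), 4)  # clusters (1,2), (1,2,3): the species tree of the example
T = (((2, 3), 4), 1)  # clusters (2,3), (2,3,4): the gene tree of the example

print(mdc_score(T, S))  # 2, as derived in the text
print(mdc_score(S, T))  # 3, roles of gene tree and species tree swapped
```

Trivial clusters and the full leaf set always give k = 1 and therefore contribute nothing, so summing over all clusters is equivalent to summing over the non-trivial ones only.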

Table 1 summarizes the number of extra lineages for all possible gene tree / species
tree combinations on all trees on four taxa as depicted by Figure 2. Note that this matrix
is not symmetric. For example, we have l(T_1, T_8) = 3 and l(T_8, T_1) = 2 (both highlighted in
bold), so the roles of the gene tree and the species tree are not interchangeable.

We now derive the following definition of MDC trees. 

Definition 1. A tree S is an MDC tree for a multiset G_m = {T_1, ..., T_m} of gene trees on
a taxon set X if and only if

\[
S = \operatorname*{argmin}_{S' \in \mathcal{T}} \sum_{i=1}^{m} \sum_{e \in E(S')} k_{T_i}(e)
  = \operatorname*{argmin}_{S' \in \mathcal{T}} \sum_{i=1}^{m} l(T_i, S')
  = \operatorname*{argmin}_{S' \in \mathcal{T}} \sum_{T \in G} x_T \cdot l(T, S').
\]

Here, $\mathcal{T}$ denotes all rooted binary phylogenetic trees on taxon set X, E(S') denotes the
edge set of tree S', k_{T_i}(e) denotes the number of extra lineages edge e of S' needs when T_i is
mapped to S', G denotes the simple set induced by G_m and x_T denotes the number of
times tree T is contained in G_m.
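The last form of the definition, a weighted sum over the distinct trees of G with multiplicities x_T, can be sketched as follows. This is an illustration only: the nested-tuple tree encoding, the function names and the use of `collections.Counter` to obtain the multiplicities are assumptions, and the candidate list passed to `mdc_trees` is a deliberately small stand-in for the full set of trees on X.

```python
from collections import Counter


def leaves(tree):
    """Leaf set of a nested-tuple tree; a leaf is an int."""
    if isinstance(tree, int):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def clusters(tree):
    """All clusters of `tree`, trivial ones included."""
    if isinstance(tree, int):
        return {leaves(tree)}
    return clusters(tree[0]) | clusters(tree[1]) | {leaves(tree)}


def count_maximal_clades(gene, target):
    """Number of maximal clades of `gene` whose leaf set lies in `target`."""
    if leaves(gene) <= target:
        return 1
    if isinstance(gene, int):
        return 0
    return sum(count_maximal_clades(child, target) for child in gene)


def mdc_score(gene, species):
    """l(gene, species) computed cluster by cluster."""
    return sum(count_maximal_clades(gene, c) - 1 for c in clusters(species))


def mdc_trees(gene_multiset, candidates):
    """Return (minimum total score, all candidates attaining it)."""
    counts = Counter(gene_multiset)  # x_T for every distinct tree T
    totals = [
        sum(x * mdc_score(tree, s) for tree, x in counts.items())
        for s in candidates
    ]
    best = min(totals)
    return best, [s for s, tot in zip(candidates, totals) if tot == best]


tree_a = (((1, 2), 3), 4)
tree_b = (((2, 3), 4), 1)
print(mdc_trees([tree_a, tree_a, tree_b], [tree_a, tree_b]))
```

Grouping equal trees first means each score l(T, S') is computed once per distinct gene tree rather than once per copy, which is the practical benefit of the third form.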


We are now in a position to state our results. 
Table 1: Ranging over all trees from Figure 2, this table gives the corresponding MDC
scores l(T, S), where T denotes the assumed gene tree and S the species tree. The values
l(T_1, T_8) = 3 and l(T_8, T_1) = 2 are highlighted in bold to demonstrate that this matrix is not
symmetric.





Results 


We will now introduce heredity of a species tree, which is the main concept of this paper. 

Definition 2. Let S be an MDC tree for a multiset G_m = {T_1, ..., T_m} of gene trees. Now
we consider the set G_m^i = {T|_Y : T ∈ G_m and Y = X \ {i}}, i.e. the set containing all
subtrees of the trees in G_m which result from deleting leaf i. Then, S is called hereditary if
there is an i ∈ X such that S|_{X \ {i}} is an MDC tree for G_m^i.

Informally speaking, an MDC tree S is hereditary if it contains at least one MDC 
subtree. This way, MDC trees could be traced back to smaller MDC trees by deleting one 
leaf at a time (however, it would not be clear which one to delete). We now state our main 
observation before we prove it subsequently. 

Observation 1. There exist gene trees T_1, ..., T_m on a taxon set X such that their MDC
tree S is unique, but such that S|_Y is not an MDC tree for T_1|_Y, ..., T_m|_Y for all subsets Y
with Y ⊂ X, |Y| = |X| − 1. This implies that removing one taxon from the analysis
changes the topology of the MDC tree.

Next we describe our approach of constructing an example to prove the observation. 
We believe that this approach can be useful for proving or disproving other statements 
regarding MDC. 

Note that by Definition 1, a tree S is an MDC tree for G_m = {T_1, ..., T_m}, where all T_i
are trees on a common taxon set X, if it minimizes the sum of l(T, S') over all T in G_m
among all trees S' on X, so S is an MDC tree if and only if

\[
\sum_{i=1}^{m} l(T_i, S) \leq \sum_{i=1}^{m} l(T_i, S') \quad \text{for all trees } S' \neq S \text{ on taxon set } X. \tag{1}
\]

So for example, if we look at X = {1,2,3,4}, there are 15 rooted binary phylogenetic trees
T_1, ..., T_15, which are all depicted by Figure 2. If we want one of them, say T_1, to be an
MDC tree, (1) gives us 14 inequalities which a multiset of gene trees G_m would have to
fulfill, because T_1 has to be at least as good as any of the other 14 trees. If we want T_1 to
be the unique MDC tree of G_m, all inequalities in (1) are strict.
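The count of 15 trees can be checked against the standard closed formula for the number of rooted binary phylogenetic trees on n labelled leaves. The formula is a well-known fact that is not stated in the text above, and the symbol $\mathcal{T}_n$ is introduced here only for this remark:

```latex
\[
  |\mathcal{T}_n| = (2n-3)!! = \prod_{j=2}^{n} (2j-3), \qquad
  |\mathcal{T}_3| = 1 \cdot 3 = 3, \qquad
  |\mathcal{T}_4| = 1 \cdot 3 \cdot 5 = 15 .
\]
```

Requiring one particular tree to be optimal among the 15 trees on four taxa therefore amounts to 15 − 1 = 14 comparisons, while on any three taxa only 3 candidate trees compete.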

By the same reasoning, however, if we do not want a tree to be an MDC tree, only
one of the inequalities needs to fail (i.e. at least one other tree needs to give a strictly lower
MDC score than the tree under investigation, but not necessarily all of them).
Consequently, if we are searching now for a tree S on X which has no MDC subtree on any
Y ⊂ X with |Y| = |X| − 1, we require

\[
\sum_{i=1}^{m} l(T_i|_Y, S|_Y) > \min_{S' \in \mathcal{T}} \left\{ \sum_{i=1}^{m} l(T_i|_Y, S'|_Y) \right\}
\quad \text{for all } Y \subset X \text{ with } |Y| = |X| - 1, \tag{2}
\]

where $\mathcal{T}$ denotes the set of all binary phylogenetic trees on taxon set X. So for a
given set X of taxa, it remains to check if there is a tree S which fulfills both (1) and (2)
simultaneously. We answer this affirmatively for X = {1,2,3,4} with the following
example.
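For small taxon sets, both conditions can be checked by brute force. The sketch below is not the computer algebra search used for the examples in this paper; it is a minimal Python illustration under stated assumptions: trees are nested tuples, all rooted binary trees are enumerated by inserting one leaf at a time, and the minimum over restrictions S'|_Y is taken over all trees on Y, which is the same set. All function names are hypothetical.

```python
def leaves(t):
    return frozenset([t]) if isinstance(t, int) else leaves(t[0]) | leaves(t[1])


def clusters(t):
    if isinstance(t, int):
        return {leaves(t)}
    return clusters(t[0]) | clusters(t[1]) | {leaves(t)}


def k(gene, target):
    """Maximal clades of `gene` with leaf set inside `target` (Theorem 1)."""
    if leaves(gene) <= target:
        return 1
    return 0 if isinstance(gene, int) else k(gene[0], target) + k(gene[1], target)


def score(gene, species):
    return sum(k(gene, c) - 1 for c in clusters(species))


def total(genes, species):
    return sum(score(g, species) for g in genes)


def restrict(t, keep):
    """Restriction of `t` to the leaves in `keep` (None if nothing is left)."""
    if isinstance(t, int):
        return t if t in keep else None
    left, right = restrict(t[0], keep), restrict(t[1], keep)
    if left is None or right is None:
        return right if left is None else left
    return (left, right)


def insertions(t, leaf):
    """All trees obtained by attaching `leaf` to an edge of `t` or above its root."""
    yield (t, leaf)
    if not isinstance(t, int):
        for new_left in insertions(t[0], leaf):
            yield (new_left, t[1])
        for new_right in insertions(t[1], leaf):
            yield (t[0], new_right)


def all_trees(taxa):
    trees = [taxa[0]]
    for leaf in taxa[1:]:
        trees = [new for t in trees for new in insertions(t, leaf)]
    return trees


def unique_mdc_tree(genes, taxa):
    """Return the MDC tree if all inequalities in (1) are strict, else None."""
    ranked = sorted(all_trees(taxa), key=lambda s: total(genes, s))
    if len(ranked) > 1 and total(genes, ranked[0]) == total(genes, ranked[1]):
        return None
    return ranked[0]


def has_mdc_subtree(genes, species, taxa):
    """False exactly when (2) holds for every Y obtained by deleting one taxon."""
    for i in taxa:
        Y = [x for x in taxa if x != i]
        sub_genes = [restrict(g, set(Y)) for g in genes]
        best = min(total(sub_genes, s) for s in all_trees(Y))
        if total(sub_genes, restrict(species, set(Y))) == best:
            return True
    return False


taxa = [1, 2, 3, 4]
genes = [(((1, 2), 3), 4)]
S = unique_mdc_tree(genes, taxa)
print(S, has_mdc_subtree(genes, S, taxa))
```

A multiset is a counterexample to heredity when `unique_mdc_tree` returns a tree and `has_mdc_subtree` returns False for it. The toy input above, a single gene tree, is hereditary; reproducing the examples below would additionally require the tree numbering of Figure 2.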

Example 1. We consider the case X = {1,2,3,4}, for which 15 trees exist as depicted by
Figure 2. We consider the multiset G_m which consists of 11 copies of T_2, 10 copies of T_4, 2
copies of T_6, 3 copies of T_7 and 3 copies of T_15, i.e. we have x_2 = 11, x_4 = 10, x_6 = 2,
x_7 = 3, x_15 = 3 and x_i = 0 for all other i. Note that G_m contains no copy of T_1, but still T_1
can be shown to be the unique MDC tree of G_m. This can be seen by looking at Table 2,
whose 3rd column contains the MDC scores of all trees: T_1 gives the unique minimum
value. This shows why MDC is truly just a consensus method: It does not give back the
majority tree, but instead it gives a compromise of the input trees, which may well not be
contained in the data. However, in this example, none of the subtrees of T_1 is an MDC tree:
When considering G_m|_Y with Y ⊂ X and |Y| = 3, the corresponding subtree of T_1 is not
MDC for Y, as can also be seen in Table 2. Note that our construction of G_m with
|G_m| = 29 as summarized in Table 2 is minimal in the sense that there is no smaller multiset G_m
on 4 taxa with the same properties, i.e. such that T_1 is uniquely MDC but none of its
subtrees is MDC and that additionally T_1 is not contained in G_m. We verified this by an
exhaustive search through all hypothetical smaller solutions with the help of a computer
algebra system (calculations not shown).

Figure 2: All 15 rooted binary trees on taxon set X = {1,2,3,4}. The tree shape of trees
T_13, T_14 and T_15 is called balanced. Note that there are only three balanced trees, but twelve
unbalanced ones.

Figure 3: We consider all trees from Figure 2 on taxon set X = {1,2,3,4}. Now we restrict
the taxon set to Y = {1,2,3}, i.e. Y is derived from X by deleting taxon 4, and consider
all possible rooted trees on Y. This leads to three trees. We repeat this for the subsets
obtained by deleting leaf 3, leaf 2 and leaf 1, respectively, and obtain the possible
trees depicted in lines 2-4 of the above table.

Example 2. In order to show that our result in Example 1 does not depend on the tree
shape, we constructed the following example: G_m now consists of 2 copies of T_1, 2 copies of
T_12, 1 copy of T_14 and 1 copy of T_15. Here, T_13, whose tree shape is balanced (as opposed to
that of T_1 as considered in the first example), is the unique MDC tree, which again has no
MDC subtrees. Table 3 summarizes these findings. Moreover, note that T_13 is not
contained in G_m. As with Example 1, this construction with |G_m| = 6 is minimal: There is
no smaller multiset G_m such that T_13 is the unique MDC tree, but has no MDC subtrees and is
not contained in G_m. However, when comparing Example 2 with Example 1, we conclude
that while such constructions are possible for both tree shapes on 4 taxa, those for the
balanced tree shape require fewer trees in G_m than for the other tree shape.
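The claims of Example 2 can be read off Table 3 mechanically. The short Python sketch below transcribes the score columns of Table 3 (total score, followed by the scores on the taxon sets {1,2,3}, {1,2,4}, {1,3,4} and {2,3,4}) and checks them; the dictionary layout and variable names are assumptions of this illustration.

```python
# Tree index i -> (l(G_m, T_i), scores on {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}),
# transcribed from Table 3.
table3 = {
    1: (10, 4, 4, 3, 3),   2: (12, 4, 4, 5, 5),   3: (11, 5, 4, 3, 3),
    4: (13, 5, 5, 3, 4),   5: (13, 4, 5, 5, 5),   6: (13, 5, 5, 5, 4),
    7: (11, 3, 4, 3, 3),   8: (11, 3, 3, 4, 3),   9: (13, 4, 3, 5, 5),
    10: (11, 3, 3, 4, 5),  11: (12, 5, 5, 4, 4),  12: (10, 3, 3, 4, 4),
    13: (8, 4, 4, 4, 4),   14: (10, 5, 3, 3, 5),  15: (10, 3, 5, 5, 3),
}

# Trees attaining the minimum total score on all four taxa.
best_total = min(row[0] for row in table3.values())
mdc_trees = [i for i, row in table3.items() if row[0] == best_total]

# Minimum score per triple column, and the triples on which T_13 attains it.
column_minima = [min(row[c] for row in table3.values()) for c in range(1, 5)]
t13_optimal_triples = [
    c for c in range(1, 5) if table3[13][c] == column_minima[c - 1]
]

print(mdc_trees, column_minima, t13_optimal_triples)
```

The tree T_13 is the only one attaining the minimum total score, while on each of the four triples some other tree scores strictly lower, which is exactly the non-heredity stated in the example.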

Example 3. While we find Examples 1 and 2 interesting exactly because the MDC tree
under consideration does not occur in G_m at all, this might lead to the wrong conclusion
that the non-heredity is caused by this fact. Therefore, we repeated our calculations and
searched for a case where the opposite is true, namely that T_1 is the unique MDC tree and
has no MDC subtrees but is also the most frequent tree in G_m. Table 4 summarizes a
minimal example for this scenario. Here, G_m consists of 8 copies of T_1, 6 copies of T_2, 1
copy of T_3, 7 copies of T_4, 7 copies of T_6 and 4 copies of T_15. Here, |G_m| = 33, so this
example requires slightly more trees in G_m than Example 1, where T_1 was not contained in
G_m. This might be due to the fact that if the support for T_1 is stronger than that for any
other tree in G_m, then the signal induced by T_1 for the subtrees is also strong, which is why
more trees are needed to annihilate this signal of the subtrees. However, this example is
also interesting for another reason: Here, not only are the subtrees of T_1 each outperformed
by some subtrees of the other T_i, but there is in fact one tree, namely T_6, of which all
subtrees outperform the corresponding subtrees of T_1 concerning MDC, but still T_6 is
outperformed by T_1. So the signal given by the trees of size 3 leads to an entirely different
conclusion than the signal induced by the trees of size 4.
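The comparison between T_1 and T_6 in Example 3 can likewise be checked against Table 4. The sketch below transcribes the score columns of Table 4 in the same layout as before (total score, then the four triple scores); the variable names are assumptions of this illustration.

```python
# Tree index i -> (l(G_m, T_i), scores on {1,2,3}, {1,2,4}, {1,3,4}, {2,3,4}),
# transcribed from Table 4.
table4 = {
    1: (50, 19, 18, 17, 20),   2: (54, 19, 18, 16, 27),   3: (56, 18, 18, 17, 20),
    4: (58, 18, 15, 17, 19),   5: (57, 19, 15, 16, 27),   6: (55, 18, 15, 16, 19),
    7: (60, 29, 18, 17, 20),   8: (91, 29, 33, 33, 20),   9: (68, 19, 33, 16, 27),
    10: (95, 29, 33, 33, 27),  11: (66, 18, 15, 33, 19),  12: (95, 29, 33, 33, 19),
    13: (52, 19, 18, 33, 19),  14: (58, 18, 33, 17, 27),  15: (51, 29, 15, 16, 20),
}

totals = {i: row[0] for i, row in table4.items()}
unique_best = [i for i, s in totals.items() if s == min(totals.values())]
column_minima = [min(row[c] for row in table4.values()) for c in range(1, 5)]

# On every triple, T_6 attains the column minimum and strictly beats T_1 ...
t6_wins_all_triples = all(
    table4[6][c] == column_minima[c - 1] and table4[6][c] < table4[1][c]
    for c in range(1, 5)
)
# ... yet T_1 has the strictly lower score on all four taxa.
t1_wins_overall = totals[1] < totals[6]

print(unique_best, t6_wins_all_triples, t1_wins_overall)
```

So the triple scores alone would favour T_6 on every subset of three taxa, although T_1 is the unique minimum on the full taxon set.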

Example 4. Our last example repeats the construction of Example 3 for the case that the
unique MDC tree is T_13, and thus the tree shape is the balanced one. Here, G_m consists of
3 copies of T_1, 1 copy of T_10, 2 copies of T_12, 1 copy of T_14 and 1 copy of T_15. The total
number of 8 trees in G_m is minimal in the sense that for a smaller multiset of trees, it is
not possible that T_13 is the unique MDC tree with no MDC subtree and such that T_13
occurs most frequently in G_m. Again, as in the comparison of Example 1 and Example 2,
we realize that we need significantly fewer trees in order to construct such an example for
the balanced tree shape than for the other one. Moreover, considering Example 2, we
conclude that we need slightly more trees in G_m if we make the signal for T_13 strong than if
T_13 is not present at all. This coincides with the findings explained in Example 3 for the
other tree shape.


Discussion 

We have shown that MDC is not hereditary in the sense that MDC trees need not have any
MDC subtrees or MDC triples (note that in our examples with X = {1,2,3,4}, all subtrees
of size |X| − 1 are triples, so our examples show non-heredity for subtrees as well as
for triples). We showed that this is true for different tree shapes and regardless of
whether the MDC tree is contained in the input tree set or not. Our proof used the technique of
regarding MDC trees simply as the solution of a system of inequalities, which can be easily
calculated using the formula for counting the extra lineages needed by a tree to be placed
into a species tree presented by Than and Nakleh (2009).

Table 2: Example 1: Here, we consider a multiset of trees G_m, which consists of 11 copies
of T_2, 10 copies of T_4, 2 copies of T_6, 3 copies of T_7 and 3 copies of T_15. The total number of
29 is minimal, i.e. G_m is a minimal multiset, such that T_1 is the unique MDC tree but has
no MDC subtrees of size 3 and T_1 ∉ G_m.

However, it is desirable that a method estimating the species tree of a given set of 
gene trees be hereditary, as the question whether or not a certain taxon is contained in the 
analysis should not alter the relationship of the remaining taxa. In this sense, non-heredity 
can be regarded as a drawback of a method. 

Our result shows that MDC, which is sometimes regarded as the parsimony method
of gene tree reconciliation (Maddison (1997)), suffers from this drawback as does Maximum
Parsimony in phylogenetic tree reconstruction (Fischer (2012)). Moreover, the technique used to
prove the non-heredity of MDC could be adapted from an analogous technique for
Maximum Parsimony, namely solving a system of inequalities.


On the other hand, non-heredity can also be helpful: Degnan and Rosenberg (2006)
showed that the most likely gene tree on a taxon set X always coincides with the species
tree for |X| = 3, but that this does not in general hold if |X| ≥ 4. This implies that taking
the tree suggested by the majority of gene trees as an estimate for the species tree works
for triples, but not necessarily for larger trees. Therefore, the authors suggest to use gene
triples and to construct the species tree estimate from these triples by combining them into
one common supertree (however, it is well known that there are then other problems like
incompatibility of input trees or non-uniqueness of the supertree (cf. Steel and Sanderson
(2010))). So this idea basically says that while estimating the entire species tree by using
the majority tree can go wrong, estimating the triples by majority is always correct. In this
sense, majority estimates are non-hereditary, too, because the (correct) subtree solution
differs from the (possibly incorrect) solution on the entire taxon set, and
Degnan and Rosenberg (2006) suggest to use this knowledge of non-heredity as a means to
overcome the problem of a wrong majority tree estimate by considering triple subtrees
instead.

In Tables 3 to 5, a column headed by Y = {...} gives the score l(G_m|_Y, T_i|_Y).

T_i     x_i   l(G_m, T_i)   Y={1,2,3}   Y={1,2,4}   Y={1,3,4}   Y={2,3,4}
T_1      2        10            4           4           3           3
T_2      0        12            4           4           5           5
T_3      0        11            5           4           3           3
T_4      0        13            5           5           3           4
T_5      0        13            4           5           5           5
T_6      0        13            5           5           5           4
T_7      0        11            3           4           3           3
T_8      0        11            3           3           4           3
T_9      0        13            4           3           5           5
T_10     0        11            3           3           4           5
T_11     0        12            5           5           4           4
T_12     2        10            3           3           4           4
T_13     0         8            4           4           4           4
T_14     1        10            5           3           3           5
T_15     1        10            3           5           5           3

Table 3: Example 2: Here, we consider a multiset of trees G_m, which consists of 2 copies of
T_1, 2 copies of T_12, 1 copy of T_14 and 1 copy of T_15. The total number of 6 is minimal, i.e.
G_m is a minimal multiset, such that T_13 is the unique MDC tree but has no MDC subtrees
of size 3 and T_13 ∉ G_m.

T_i     x_i   l(G_m, T_i)   Y={1,2,3}   Y={1,2,4}   Y={1,3,4}   Y={2,3,4}
T_1      8        50           19          18          17          20
T_2      6        54           19          18          16          27
T_3      1        56           18          18          17          20
T_4      7        58           18          15          17          19
T_5      0        57           19          15          16          27
T_6      7        55           18          15          16          19
T_7      0        60           29          18          17          20
T_8      0        91           29          33          33          20
T_9      0        68           19          33          16          27
T_10     0        95           29          33          33          27
T_11     0        66           18          15          33          19
T_12     0        95           29          33          33          19
T_13     0        52           19          18          33          19
T_14     0        58           18          33          17          27
T_15     4        51           29          15          16          20

Table 4: Example 3: Here, we consider a multiset of trees G_m, which consists of 8 copies of
T_1, 6 copies of T_2, 1 copy of T_3, 7 copies of T_4, 7 copies of T_6 and 4 copies of T_15. The total
number of 33 is minimal, i.e. G_m is a minimal multiset, such that T_1 is the unique MDC
tree but has no MDC subtrees of size 3 and such that the number of T_1 in G_m is strictly
larger than that of any other T_i. Interestingly, all subtrees of T_6 are strictly better than the
subtrees of T_1, but T_1's MDC score of 50 is still slightly lower than that of T_6, which is 55.

T_i     x_i   l(G_m, T_i)   Y={1,2,3}   Y={1,2,4}   Y={1,3,4}   Y={2,3,4}
T_1      3        13            5           5           4           4
T_2      0        15            5           5           7           6
T_3      0        15            7           5           4           4
T_4      0        19            7           7           4           6
T_5      0        17            5           7           7           6
T_6      0        19            7           7           7           6
T_7      0        15            4           5           4           4
T_8      0        15            4           4           5           4
T_9      0        16            5           4           7           6
T_10     1        14            4           4           5           6
T_11     0        18            7           7           5           6
T_12     2        14            4           4           5           6
T_13     0        11            5           5           5           6
T_14     1        13            7           4           4           6
T_15     1        14            4           7           7           4

Table 5: Example 4: Here, we consider a multiset of trees G_m, which consists of 3 copies of
T_1, 1 copy of T_10, 2 copies of T_12, 1 copy of T_14 and 1 copy of T_15. The total number of 8 is
minimal, i.e. G_m is a minimal multiset, such that T_13 is the unique MDC tree but has no
MDC subtrees of size 3 and such that the number of T_13 in G_m is strictly larger than that
of any other T_i.

For MDC, however, it is not so clear if non-heredity has any advantages which could 
lead to an improved tree reconciliation method, as it is (unlike in the majority scenario) 
not a priori clear if the subtree estimate or the entire tree estimate is better. This gives rise 
to a variety of research questions which will be considered in forthcoming papers. 
